One Practical Algorithm for Both Stochastic and Adversarial Bandits (Full Version Including Appendices)

Authors

Yevgeny Seldin

Aleksandrs Slivkins

Abstract

We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.
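The idea of adding per-arm exploration parameters to EXP3 can be illustrated with a minimal sketch. The abstract does not give the schedules, so everything concrete here is an assumption made for illustration: the learning rate `eta`, the cap of `1 / (2 * n_arms)` on each exploration rate, and the hypothetical callables `loss_fn` and `explore_fn`. It should not be read as the authors' tuning or as their gap-detection rule.

```python
import math
import random


def exp3_per_arm_exploration(loss_fn, n_arms, horizon, explore_fn, seed=0):
    """Loss-based EXP3 with an individual exploration rate for each arm.

    loss_fn(t, arm) -> loss in [0, 1] and explore_fn(t, arm) -> exploration
    rate are hypothetical interfaces chosen for this sketch.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        # Assumed decreasing learning rate (the "old" control lever).
        eta = math.sqrt(math.log(n_arms) / (t * n_arms))
        # Exponential weights; subtracting the minimum avoids underflow of all weights.
        base = min(cum_loss_est)
        weights = [math.exp(-eta * (est - base)) for est in cum_loss_est]
        total = sum(weights)
        # Per-arm exploration (the "new" control lever), clamped to [0, 1/(2K)]
        # so that the exploration rates sum to at most 1/2.
        eps = [min(max(explore_fn(t, a), 0.0), 1.0 / (2 * n_arms))
               for a in range(n_arms)]
        eps_sum = sum(eps)
        probs = [(1.0 - eps_sum) * w / total + e for w, e in zip(weights, eps)]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        # Unbiased estimate: only the played arm's estimate is updated.
        cum_loss_est[arm] += loss / probs[arm]
        pulls[arm] += 1
    return pulls


# Toy usage: arm 0 has a deterministic gap of 0.6 to the other two arms.
pulls = exp3_per_arm_exploration(
    loss_fn=lambda t, arm: 0.2 if arm == 0 else 0.8,
    n_arms=3,
    horizon=2000,
    explore_fn=lambda t, arm: 1.0 / math.sqrt(t),  # placeholder schedule
)
print(pulls)
```

In this sketch the exploration rate is the same for every arm; the point of the per-arm interface is that `explore_fn` could shrink the rate for arms that appear suboptimal, which is the role the abstract assigns to the new control lever.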


Similar resources

One Practical Algorithm for Both Stochastic and Adversarial Bandits


OPTIMIZATION OF A PRODUCTION LOT SIZING PROBLEM WITH QUANTITY DISCOUNT

The dynamic lot sizing problem is one of the significant problems in industrial units and has been considered by many researchers. Considering a quantity discount in the purchasing cost is one of the important and practical assumptions in inventory control models, yet it has received less attention in the stochastic version of the dynamic lot sizing problem. In this paper, stochastic dyn...


The Best of Both Worlds: Stochastic and Adversarial Bandits

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two...


An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is O(K√(n log n)) and against stochastic bandits the pseudo-regret is O(∑_i (log n)/Δ_i). We also show that no algorithm with O(log n) pseudo-regret against stochastic bandits can achieve Õ(√n) expected regret against adaptive...


DISCRETE SIZE AND DISCRETE-CONTINUOUS CONFIGURATION OPTIMIZATION METHODS FOR TRUSS STRUCTURES USING THE HARMONY SEARCH ALGORITHM

Many methods have been developed for structural size and configuration optimization in which cross-sectional areas are usually assumed to be continuous. In most practical structural engineering design problems, however, the design variables are discrete. This paper proposes two efficient structural optimization methods based on the harmony search (HS) heuristic algorithm that treat both discret...




Publication date: 2014
